A Searchable Compressed Edit-Sensitive Parsing

Authors

  • Naoya Kishiue
  • Masaya Nakahara
  • Shirou Maruyama
  • Hiroshi Sakamoto
Abstract

A searchable data structure for edit-sensitive parsing (ESP) is proposed. Given a string S, its ESP tree is equivalent to a context-free grammar G generating just S, which is represented by a DAG. Using succinct data structures for trees and permutations, G is decomposed into two LOUDS bit strings and a single array occupying $(1+\varepsilon)\,n \log n + 4n + o(n)$ bits for any $0 < \varepsilon < 1$, where n is the number of variables in G. The time to count occurrences of a pattern P in S is $O\left(\frac{1}{\varepsilon}\left(m \log n + occ_c(\log m \log u)\right)\right)$, where m = |P|, u = |S|, and $occ_c$ is the number of occurrences of a maximal common subtree in the ESPs of P and S. The efficiency of the proposed index is evaluated by experiments on several benchmarks, in comparison with other compressed indexes.
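The abstract mentions LOUDS (level-order unary degree sequence) bit strings without showing what one looks like. The following is a minimal Python sketch of the standard LOUDS encoding of an ordered tree, not the authors' implementation. The function name `louds_bits`, the dictionary input format, and the toy tree are assumptions made for illustration only.

```python
from collections import deque


def louds_bits(children, root):
    """Return the LOUDS bit string of an ordered tree.

    `children` maps each node to the ordered list of its children
    (a hypothetical input format chosen for this sketch).
    """
    bits = ["10"]  # conventional super-root entry pointing at the real root
    queue = deque([root])
    while queue:
        node = queue.popleft()
        kids = children.get(node, [])
        # A node with d children contributes d ones followed by a zero.
        bits.append("1" * len(kids) + "0")
        queue.extend(kids)  # visit nodes in level (breadth-first) order
    return "".join(bits)


# Toy tree: a has children b and c; b has child d.
tree = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
encoded = louds_bits(tree, "a")
print(encoded, len(encoded))
```

A tree with k nodes yields 2k + 1 bits under this encoding, so two such bit strings over roughly n nodes each would be consistent with the 4n term in the space bound quoted in the abstract. The rank/select structures that make the encoding navigable would account for additional o(n) bits and are not shown here.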

Similar resources

Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation

Methods for evaluating dependency parsing using attachment scores are highly sensitive to representational variation between dependency treebanks, making cross-experimental evaluation opaque. This paper develops a robust procedure for cross-experimental evaluation, based on deterministic unification-based operations for harmonizing different representations and a refined notion of tree edit dist...

Online Pattern Matching for String Edit Distance with Moves

Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinal editing operations to turn one string into the other. Although optimizing EDM is intractable, it has many applications, especially in error detection. Edit sensitive parsing (ESP) is an efficient parsing algorithm that guarantees an upper bound of parsing discrepancies betwee...

Browse searchable encryption schemes: Classification, methods and recent developments

With the advent of cloud computing, data owners tend to submit their data to cloud servers and allow users to access data when needed. However, outsourcing sensitive data will lead to privacy issues. Encrypting data before outsourcing solves privacy issues, but in this case, we will lose the ability to search the data. Searchable encryption (SE) schemes have been proposed to achieve this featur...

siEDM: an efficient string index and search algorithm for edit distance with moves

Although several self-indexes for highly repetitive text collections exist, developing an index and search algorithm with editing operations remains a challenge. Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinal editing operations to turn one string into another. Although the problem of computing EDM is intractable, it has...

MRCSI: Compressing and Searching String Collections with Multiple References

Efficiently storing and searching collections of similar strings, such as large populations of genomes or long change histories of documents from Wikis, is a timely and challenging problem. Several recent proposals could drastically reduce space requirements by exploiting the similarity between strings in so-called reference-based compression. However, these indexes are usually not searchable an...

Journal:
  • CoRR

Volume abs/1101.0080

Publication date: 2010